Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN
نویسندگان
چکیده
Fine-grained image labels are desirable for many computer vision applications, such as visual search or mobile AI assistant. These applications rely on image classification models that can produce hundreds of thousands (e.g. 100K) of diversified fine-grained image labels on input images. However, training a network at this vocabulary scale is challenging, and suffers from intolerable large model size and slow training speed, which leads to unsatisfying classification performance. A straightforward solution would be training separate expert networks (specialists), with each specialist focusing on learning one specific vertical (e.g. cars, birds...). However, deploying dozens of expert networks in a practical system would significantly increase system complexity and inference latency, and consumes large amounts of computational resources. To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into one single model (one student network) to classify 100K object categories. There are three salient aspects in our method: (1) a multi-teacher single-student knowledge distillation framework; (2) a self-paced learning mechanism to allow the student to learn from different teachers at various paces; (3) structurally connected layers to expand the student network capacity with limited extra parameters. We validate our method on OpenImage and a newly collected dataset, Entity-Foto-Tree (EFT), with 100K categories, and show that the proposed model performs significantly better than the baseline generalist model.
منابع مشابه
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملLarge-Scale R-CNN with Classifier Adaptive Quantization
This paper extends R-CNN, a state-of-the-art object detection method, to larger scales. To apply R-CNN to a large database storing thousands to millions of images, the SVM classification of millions to billions of DCNN features extracted from object proposals is indispensable, which imposes unrealistic computational and memory costs. Our method dramatically narrows down the number of object pro...
متن کاملClassification of Brain Tumor by Combination of Pre-Trained VGG16 CNN
In recent years, brain tumors become the leading cause of death in the world. Detection and rapid classification of this tumor are very important and may indicate the likely diagnosis and treatment strategy. In this paper, we propose deep learning techniques based on the combinations of pre-trained VGG-16 CNNs to classify three types of brain tumors (i.e., meningioma, glioma, and pituitary tumo...
متن کاملDeep-learning Versus OBIA for Scattered Shrub Detection with Google Earth Imagery: Ziziphus lotus as Case Study
There is a growing demand for accurate high-resolution land cover maps in many fields, e.g., in land-use planning and biodiversity conservation. Developing such maps has been traditionally performed using Object-Based Image Analysis (OBIA) methods, which usually reach good accuracies, but require a high human supervision and the best configuration for one image often cannot be extrapolated to a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1711.07607 شماره
صفحات -
تاریخ انتشار 2017